Fusion of Speech, Faces and Text for Person Identification in TV Broadcast
Authors
Abstract
The Repere challenge is a project aimed at evaluating systems for supervised and unsupervised multimodal recognition of people in TV broadcast. In this paper, we describe, evaluate and discuss the QCompere consortium submissions to the dry run of the 2012 Repere evaluation campaign. Speaker identification (and face recognition) can be greatly improved when combined with name detection through video optical character recognition. Moreover, we show that unsupervised multimodal person recognition systems can achieve performance nearly as good as supervised monomodal ones (with several hundred identity models).
Related resources
Person name recognition and linking from overlay text in TV broadcast shows
Identifying people in video broadcast is by nature a multimodal task: persons can be identified thanks to biometric information (face or voice), or thanks to a reference to their identity in the overlaid text or the speech content. In the framework of the French evaluation program Repere, this paper presents a method for identifying speakers in videos without any a-priori models, based only on ...
Multimodal understanding for person recognition in video broadcasts
This paper describes a multi-modal person recognition system for video broadcast developed for participating in the DefiRepere challenge. The main track of this challenge targets the identification of all persons occurring in a video either in the audio modality (speakers) or the image modality (faces). This system is developed by the PERCOL team involving 4 research labs in France and was rank...
UPC System for the 2015 MediaEval Multimodal Person Discovery in Broadcast TV task
This paper describes a system to identify people in broadcast TV shows in a purely unsupervised manner. The system outputs the identity of people that appear, talk and can be identified by using information appearing in the show (in our case, text with person names). Three types of monomodal technologies are used: speech diarization, video diarization and text detection / named entity recogniti...
Reliability based budgeting with the case study of TV broadcast
Planning a budget helps to identify wasteful expenditures, adapt quickly to changes in the financial situation, and achieve financial goals. Reliability-based budgeting is of great importance for the broadcasting industry. In this study, several kinds of failure modes in TV broadcasting system have been det...
Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques
One interesting topic in the multimedia domain is enabling computers to produce speech. Speech synthesis gives the computer the human ability to produce speech. The data-based approach and the process-based approach are the two main approaches to speech synthesis. Each approach has its own challenges. Unit-selection speech synthesis and statistical parametr...
Publication date: 2012